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I. INTRODUCTION 


The possibility of applying techniques favorable to a be- 
havioral prognosis in several areas of knowledge motivates 
the use of statistics as a tool to support decision-making, 
which is justified by its ability to help in the analysis and 
interpretation of data. Thus, statistics are present in several 
studies and applications in the areas of engineering, exact 
sciences, biological sciences, and health sciences [1, 8, 10, 
13, 14, 15, 16, 17, 20, 21, 22, 26, 27, 31], given that the 
Probabilistic analysis can be understood as the study of pre- 
dicting the behavior of a single variable, or a set of variables 
in a specific scenario. Therefore, its conception consists of 
quantifying the uncertainty associated with the occurrence 
phenomenon. 


Currently, the preservation of natural systems is consid- 
ered one of the challenges of Brazilian society, with water 
being one of the environmental factors that have caused 
considerable concern among professionals working in this 
area [29]. It is known that the quality of water depends on 
the actions of man and various natural conditions, and 
knowledge of its information becomes essential for the 
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Abstract — The problem of transporting pollutants in natural rivers can be 
modeled using saline tracer injection techniques, which are very useful for 
obtaining important information related to water quality in river stretches, 
such as the physical parameter longitudinal dispersion coefficient. The 
objectives of this work are to formulate an inverse problem for the tracer 
flow process in natural rivers using a Bayesian approach to update the 
longitudinal dispersion coefficient, and use the Markov Chain Monte Carlo 
(MCMC) to solve the inverse problem formulated. 


management of water resources. Lack of knowledge of wa- 
ter quality increases uncertainties in future decision-mak- 
ing, which in turn has negative consequences in the man- 
agement of water resources [29]. 


The use of tracer injection techniques in a given location 
of the watercourse has been widely used in studies of prob- 
lems related to the transport of pollutants in natural rivers, 
to seek important information on water quality, such as, for 
example, the physical parameter longitudinal dispersion co- 
efficient. The mathematical model that describes this phys- 
ical flow process is composed of a partial differential equa- 
tion subject to the certain boundary and initial conditions. 


Several methods in the literature can be used to determine 
the longitudinal dispersion coefficient [25, 26] present in the 
mathematical model that describes the physical process of 
tracer flow in natural rivers. However, this work proposes a 
Bayesian methodology together with the Markov Chains 
Monte Carlo method as an alternative to traditional methods 
for estimating the parameter of unknown interest. 


The Bayesian methodology is based on one of the most 
important mathematical formulations of probability theory, 
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known as Bayes' theorem, which updates the information of 
the parameter of unknown interest, taking into account the 
a priori information about the parameter of interest and the 
known information about the observed sample [2, 5, 9, 15, 
231. 


To solve the inverse Bayesian problem of transporting 
pollutants in natural rivers, the Markov Chain Monte Carlo 
(MCMC) stochastic method can be used, which is based on 
specific algorithms to simulate ergodic Markov Chains 
whose stationary distribution (or equilibrium distribution) is 
the posterior probability distribution of interest [5, 9, 15, 
23]. Among the various specific algorithms used by the 
MCMC method to generate the Marvok Chains, the special 
case of the Metropolis-Hastings algorithm [13, 21] based on 
random walk [4, 5, 9, 15, 23] is used in this work, which 
proposes the new point candidate (longitudinal dispersion 
coefficient) considering the current previously simulated 
longitudinal dispersion coefficient value plus a random in- 
crement. 


The motivation of this proposal is that the Bayesian meth- 
odology together with the Markov Chain Monte Carlo 
method is an attractive and efficient statistical tool, which 
has significantly contributed to the scientific and technolog- 
ical development of several areas of knowledge, for exam- 
ple, in the estimation of the physical parameter (permeabil- 
ity) in fluid flow problems in porous media [3, 4, 5, 6, 7, 11, 
12, 15, 20]. 


In this sense, it is expected that the study presented in this 
research paper can significantly contribute to problems re- 
lated to the monitoring and preservation of natural rivers 
that receive some type of liquid waste with harmful proper- 
ties to the environment, which can cause changes in the eco- 
system and negatively impact the entire its dependent chain. 
More details on the environmental problems described 
above can be found in the literature [24, 29, 30]. 


IH. MATHEMATICAL MODELING 


Considering a rectangular domain N ER limited in 
region [0,L,] x [0, Ly in the contour 0.2, with L,.[m] and 
Ly|m] the physical dimensions of the river in the and 
directions x e y, respectively. Here c = c(x,t)[mg/l] is 
the concentration of the tracer at the point x = (x,y) in the 
instant of time t[s]. 


The mathematical model that describes the physical 
process of transporting contaminants in river with domain 
N, over a span of time I = [0,T], with t E J, is described by 
the following partial differential equation [28]: 


dar a 0 (<<) +e 0 (=) À 
ot “ox “oxlox) “ay \ay/’ (1) 
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where 1n this equation u 1s velocity river water, expressed 
in m/s; E,is longitudinal dispersion coefficient, expressed 
in m*/s; e E, é o longitudinal dispersion cross, expressed 
in m*/s. The Eq. (1) is subject to the following boundary 
conditions [28]: 


c(x,y,t) = co; x = 0,0 < y < L},,t > 0, 


ðc(x,y,t) 
— ay ie Hb VS Sly, t > 9, 
Oc(x,y,t (2) 
O op 00 oe SiO 
dy 
Oc(x, y,t 
UE = Oyly 0Sa S byt >0, 


and initial 
c(x,y,0)=c(x,y);0<x<L,0<y<L,. (3) 


In Fig. 1 we find the schematically represents the flow 
problem in geometry N, subject to boundary conditions (2). 


Oc o 
Oy ac 





Fig. 1: Discretization and boundary conditions in the 
domain. 


To solve the partial differential equation (1) subject to 
conditions (2) - (3), which governs the transport of 
pollutants in river stretches, the Finite Volume method is 
used, with implicit formulation, which performs the spatial 
and temporal integration in each volume of control v,. It, is 
the variables to be calculated are located at the center and 
borders of each v,, resulting in a linear system [18,19]. In 
the advective term, an interpolation of the type is used 
Upwind (UDS), to calculate the values of the variables of 
each border in relation to the variable located in the center 
of the volume of control [18, 19]. To solve the linear system 
resulting from the discretization of the Finite Volumes 
method, whose coefficient matrix presents a characteristic 
of a sparse matrix, the Thomas Algorithm (Tridiagonal 
Matrix Algorithm) is used, which originates from the 
Gaussian elimination method [18, 19]. 


HI. METHODOLOGY 


This section presents the formulation of the inverse prob- 
lem for the flow process of tracers in river stretches using a 
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Bayesian approach to update the parameter of interest (lon- 
gitudinal dispersion coefficient - E). The MCMC method 
used to solve the inverse problem proposed in this research 
work is presented, which in turn estimates the parameter E,. 
It is worth noting that the methodology used in this research 
paper was based on the work of [15]. 


3.1 Bayesian approach 


This section presents the Bayesian methodology that will 
be used to update the longitudinal dispersion coefficients E,, 
taking into account the a priori information on the parameter 
of interest E, and the known information about the observed 
sample. 


Based on the work of [15], it is proposed to define the set 
of observed values of the tracer concentration in fluvial 
stretches (or reference values) as: 


O(co) = {cox t); j = 1+ Ne}, (4) 


where x,denotes the location of the collection point in the 
region 1); and N; denotes the number of times the tracer con- 
centration Co is evaluated over timet;, with j = 1,---, N; (see 
Fig. 2). 









He 


Launch. > 


Fig. 2: Simplified model of the study region with launch 
and collection points. 


An update of the information of the parameter of interest 
E, is performed through the Bayes' theorem expressed by 
[15]: 


P(E,|O(c,)) « P(O(c,)|E,)P (Ey), (5) 


where P(E ) | O(co)) is posterior distribution of the 
parameter of interest Es; and the o factor P(O(c,)|E;) is the 
likelihood function, which represents the contribution of 
O(co) on the parameter E; and, in the case of this work, it is 
considered as a normal distribution expressed by [15] 
POCE) « exp [=> 


where € is the error defined by [15]: 


Nt 
E = ) estu ty) = coles ti) G 
j=1 
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where, C, (x, t;) is the simulated concentration of the tracer 
at the collection point x, for the instant of time t;; and g? is 
the accuracy associated with concentration measurements 
Cs Ca t;) and C, Ca t;). 

As E, will be proposed from a normal distribution with 
mean Up = O and variance o; = 1, then it is assumed that 
the prior distribution of the parameter of interest E, is [15]: 

2 
P(E,) = exp Ezel (8) 


20% 
3.2 Markov Chain Monte Carlo method 


In the formulation of the inverse problem presented in 
Subsection 3.1, the Markov Chain Monte Carlo (MCMC) 
method is based on stochastic simulations, which has been 
shown to be an efficient technique for solving several 
complex problems [3, 4, 5, 6, 7, 10, 11, 12, 15, 20]. The 
MCMC method uses specific algorithms to generate ergodic 
Markov Chains whose stationary distribution is the 
posterior distribution P (E ) | O (co)) 


In this work we consider the Metropolis-Hastings 
algorithm based on random walk to build the Markov 
Chains [15]: 


{E{”:k € K}, (9) 
which proposes a new candidate E, given by: 
E, = EO + (hw) * 2, (10) 


where K is the set of non-negative integers; po is the 
current state of the Markov Chain (or state of the Markov 
Chain in the instant k); A, w is the parameter that determines 
the step size of the Markov Chain; and z has a Gaussian 
distribution N (up. ož), with up =0 and of = 1. The 
probability of acceptance of the new candidate E, is given 
by [15]: 


a( EE) 
(k) 
o 7 PEO Fi) A (11) 
P (E |0(co)) a(E:\E,”) 
1,B 
where 


A = P(E |0(c.)) (EE) > 0 
and 


B = otherwise, 
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such that EO = E, with probability a(E,|E;’’) and 
ER = £9 with probability 1 — a(E,|EC). It is worth 
noting that the factor q(*) is the instrumental probability 


distribution. For a better understanding of the MCMC 
method, Algorithm 1 is presented. 


Algorithm 1: The MCMC method. 


Step 1: Set k = O and specify an initial value for the lon- 


gitudinal dispersion coefficient E o such that 


P (E |o(co)) > 0. 


Step 2: Generate a new candidate E, — q(E l |E i ac- 
cording (10). 


Step 3: Solve the tracer flow problem modeled by Eq. (1) 


subject to conditions (2) - (3). 


Step 4: Calculate the probability of acceptance of the 
new candidate E,, according (11). 


Step 5: Generate w from the uniform distribution in the 
interval [0,1], this is w ~ U(0, 1). 


Step 6: If w < a(E,|E\”), then 
E =E,, 


otherwise 


(k+1) _ p(k) 
PUSE”. 


Step 7: Increment k = k + 1, return to Step 2 and 


continue the procedure until convergence is achieved. 


IV. NUMERICAL RESULTS 
4.1 Simulation parameters for observed values 


For the construction of the set of observed values of the 
tracer concentration in fluvial stretches (or reference val- 
ues), according to Eq. (4), the parameters considered the 
most adequate to the real problem are used [28]. Thus, a sa- 
line tracer (NaCl) was used, where the mean concentration 
of salinity (NaCl) in the natural river is determined at 37 
mg/l. The launch point of the plotter is 0.7 m from the 
bank. At this point, the saline concentration became 2,551 
mg/l at the time of release. The collection point was kept 
at the same distance from the river bank, however, carried 
out 50 meters downstream. 


The domain 12 that represents the surface of the river, has 
dimensions L, and L, corresponding to 182 m and 42 m, 
respectively. This region 1s discretized with a mesh of 260 
x 60 elements, resulting 1n a total of 15,600 volumes with a 
0.7 m edge each. The simulation was parameterized with a 
maximum time of 352 seconds, and the time step used to 
solve Eq. (1) was equal to 2 seconds. 
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Finally, it is considered the speed of the river water u 
equal to 0.359 m/s; the transversal dispersion coefficient 
E, equal to 0.008 m?/s; and the longitudinal dispersion co- 
efficient E, equal to 0.33 m?/s (reference value of the co- 
efficient). 


4.2 Results and discussions 


This subsection is reserved to present the numerical re- 
sults obtained with the Markov Chain Monte Carlo method 
for solving the Bayesian inverse problem. The values of the 
simulated concentration of the tracer c, (x1, t;) were ob- 
tained using the same parameters presented in Subsection 
4.1, with the exception of the value adopted for the longitu- 
dinal dispersion coefficient. 


Was specify 10.0 m?/s for initial value for the longitu- 


dinal dispersion coefficient E e, For the parameter that de- 
termines the step size of the Markov Chain in Eq. (10) was 
used hẹ = 0.01. The value of the o? in Eq. (6) was fixed 
at 0.25 for all simulations, and the tracer concentration 


E: (x, t;) was evaluated at each two seconds of simulation. 


The numerical results were obtained from a maximum of 
10,000 proposals, with the objective of selecting 1,500 ac- 
cepted samples of the longitudinal dispersion coefficient. 
The tracer concentration c, (x1, t;) was evaluated at each 2 
seconds of simulation, with a maximum time of 352 sec- 
onds. It is observed that to reach the quantity of 1,500 ac- 
cepted samples, 8,897 proposed samples needed only, thus 
resulting in an acceptance rate of 16.85%. 


In Fig. 3 are presented the variations of the tracer concen- 
tration errors values for the 1,500 samples accepted, accord- 
ing to Eq. (7). Making a visual analysis of Fig. 3, it can be 
seen that the Markov Chain generated by the MCMC 
method converges to the stationary distribution of interest, 
which in this case it is the posterior distribution [15]. It is 
noticed that shortly after the 1,000 accepted samples, more 
precisely in the 1,184 sample, the curve reaches an stability 
zone. Thus, there is a set of 316 accepted samples of the 
longitudinal dispersion coefficient, after burn-in (1,184 ac- 
cepted samples). 


It can be observed a significant reduction in tracer con- 
centration errors, reaching values below 5 measurement 
units. This can be seen in more detail in Fig. 4, which is 
presented the zoom of stability region of Fig. 3, this is, 
quantitative samples accepted from 1184 to 1500. 


The reduced values of the simulated tracer concentration 
errors indicate that the values of the accepted longitudinal 
dispersion coefficients are close to the value of the reference 
coefficient used to generate the observed tracer concentra- 
tion at the collection point x, at each instant of time ¢;. 
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1000000 1500000 2000000 


Tracer concentration error 


500000 


0 500 1000 1500 
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Fig. 3: Tracer concentration error versus quantity of 
samples accepted. 
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Fig. 4: Zoom of Fig. 3. Quantitative samples accepted 


from 1184 to 1500. 


In Fig. 5 are presented the samples accepted of longitudi- 
nal dispersion coefficients E£}, through the MCMC method 
and Metropolis-Hastings algorithm based on random walk. 
In Fig. 6 are presented the zoom of stability region of Fig. 
5, this is, quantitative samples accepted from 1184 to 1500. 


Compared to the initial value for the longitudinal disper- 


sion coefficient E a it is observed in Fig. 5 that the values 
of the accepted coefficients decrease considerably as the 
number of accepted samples increases, reaching values 
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close to the reference coefficient after a burn-in of 1184 ac- 
cepted samples (see Fig. 6). In fact, as were mentioned ear- 
lier, this factor contributes to obtaining a reduced values of 
the simulated tracer concentration errors. Furthermore, it is 
noted in Fig. 6 that the MCMC method selects distinct lon- 
gitudinal dispersion coefficients. 


o 
T 


Longitudinal dispersion coefficient 


0 500 1000 1500 


Quantity of samples accepted 


Fig. 5: Longitudinal dispersion coefficient versus quantity 
of samples accepted. 


Longitudinal dispersion coefficient 
0.330 0.335 0.340 


0.325 


0.320 


1200 1250 1300 1350 1400 1450 1500 


Quantity ofsamples accepted 


Fig. 6: Zoom of Fig. 5. Quantitative samples accepted 
from 1180 to 1500. 


In Figs. (7) — (10) are presented the tracer concentration 
profiles at the x, position, which represents the location of 
the collection point in region Q. In these figures, the solid 
red lines correspond to the tracer concentration values ob- 
tained with the reference longitudinal dispersion coefficient 
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E, = 0.33 m2/s; the solid green lines correspond to the val- 
ues of the tracer concentrations obtained with the initial lon- 


gitudinal dispersion coefficient E = 10m’/s; and the 
solid blue lines correspond to the mean values of tracer con- 
centrations obtained for a limited set of accepted samples. 


For a better analysis of the behavior of the tracer concen- 
tration profiles, the graphs with the respective mean profiles 
were divided into groups with the quantitative of 50 (see 
Fig. 8), 150 (see Fig. 9) and 250 (see Fig. 10) accepted sam- 
ples of the longitudinal dispersion coefficient after the heat- 
ing period, and also a group of with the quantitative of 50 
samples accepted before the burn-in period (see Fig. 7). 
Thus, it becomes possible to better understand the mean re- 
sults of tracer concentrations close to the reference values 
observed at the collection point x4. 


Note that there is a significant difference between the 
tracer concentration profiles determined by the reference 
(solid red lines) and initial (solid green lines) coefficients. 
In fact, this is due to the large difference between the values 
of the reference and initial longitudinal dispersion coeffi- 
cients. 


It can be seen in Fig. 7 that the mean tracer concentration 
profile (solid lines in blue) behaves very similarly and close 
to the reference concentration values, even considering the 
mean of the last 50 samples before the burn-in period. 


However, it is observed in Figs. 8 - 10 that the MCMC 
method was able to obtain better results than those presented 
in Fig. 7. In fact, this is because the average tracer concen- 
tration profiles (solid blue lines) corresponding to simula- 
tions performed using 50, 150 and 250 samples accepted of 
the dispersion coefficient after the burn-in period. It is note- 
worthy that at the end of the heating period, the tracer con- 
centration error values are very small, compared to the er- 
rors obtained at the beginning of the Markov Chain genera- 
tion, as already observed in Figs. 3 - 4. Thus, it can be seen 
that the tracer concentration profiles represented by the 
solid red lines occupy the same coordinates of the graph. 


Therefore, based on the results presented, it is observed 
that the methodology used in this research work was effi- 
cient for the estimation of the longitudinal dispersion coef- 
ficient. 
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Fig. 7: Average of the last 50 samples before the burn-in 
period. 
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Fig. 8: Average of the first 50 samples after the burn-in 
period. 
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Fig. 9: Average of the first 150 samples after the burn-in 
period. 
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Fig. 10: Average of the first 250 samples after the burn-in 
period. 


V. CONCLUSION 


In this work, a statistical methodology was used to esti- 
mate the physical parameter longitudinal dispersion coeffi- 
cient present in the tracer flow problem in natural rivers. This 
methodology consists of a Bayesian approach to formulate 
the inverse problem associated with the tracer transport 
problem, and an application of the Monte Carlo method via 
Markov Chains to solve the inverse problem formulated. Ob- 
serving the numerical results obtained in this work, pre- 
sented in Section IV, it can be stated that the MCMC method 
through the Metropolis-Hastings algorithm based on random 
walk generated a Markov Chain that converged to the equi- 
librium (or stationary) distribution, which in this case is the 
posterior distribution of interest. After the burn-in period, the 
accepted samples of the longitudinal dispersion coefficient 
were able to reduce the errors of the tracer concentration and, 
consequently, obtain better average simulated profiles of the 
tracer concentrations at the collection point. Thus, it can be 
said that the results achieved by the MCMC were quite ex- 
pressive. This fact confirms the relevance of using statistical 
methodology to solve problems within the scope of behav- 
ioral prediction. The statistical tools used in this work were 
extremely efficient in estimating the values of the parameter 
of interest (longitudinal dispersion coefficient), which in 
turn can become a significant, respectable, useful and alter- 
native resource for estimating parameters responsible for in- 
troducing uncertainties contained in the mathematical model 
that describes the physical process of tracer flow in natural 
rivers. However, it is also necessary that this methodology 
continues to be applied and tested in other types of problems, 
as well as the experimentation of new parameters and differ- 
ent formulations for the a priori distribution. 
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